Exploiting mixed-mode parallelism for matrix operations on the HERA architecture through reconfiguration
Authors
Abstract
Recent advances in multi-million-gate platform FPGAs have made it possible to design and implement complex parallel systems on a programmable chip (PSOPCs) that also incorporate hardware floating-point units (FPUs). These options take advantage of resource reconfiguration. In contrast to the majority of the FPGA community, which still employs reconfigurable logic to develop algorithm-specific circuitry, our FPGA-based mixed-mode reconfigurable computing machine can implement a variety of parallel execution modes simultaneously and is also user programmable. Our HERA (HEterogeneous Reconfigurable Architecture) machine can implement the SIMD (Single-Instruction, Multiple-Data), MIMD (Multiple-Instruction, Multiple-Data) and M-SIMD (Multiple-SIMD) execution modes. Each processing element (PE) is centered on a single-precision IEEE 754 FPU with tightly coupled local memory, and supports dynamic switching between SIMD and MIMD at runtime. Mixed-mode parallelism has the potential to best match the characteristics of all subtasks in applications, thus resulting in sustained high performance. We evaluate HERA's performance with two common computation-intensive benchmarks: matrix-matrix multiplication (MMM) and LU factorization of sparse Doubly-Bordered-Block-Diagonal (DBBD) matrices. Experimental results with electrical power network matrices show that mixed-mode scheduling for LU factorization can yield speedups of about 19% and 15.5% over the SIMD and MIMD implementations, respectively.
*This work was supported in part by the U.S. Department of Energy under grant DE-FG02-03CH11171.
Related resources
Mixed-Mode Scheduling for Parallel LU Factorization of Sparse Matrices on the Reconfigurable HERA Computer
HERA (HEterogeneous Reconfigurable Architecture) is an FPGA-based mixed-mode reconfigurable computing system that we have designed and implemented for the simultaneous execution of a variety of parallel processing modes. These modes are SIMD (Single-Instruction, Multiple-Data), MIMD (Multiple-Instruction, Multiple-Data) and M-SIMD (Multiple-SIMD). Each processing element (PE) is centered on a si...
Limitations Imposed on Mixed-Mode Performance of Optimized Phases Due to Temporal Juxtaposition
Mixed-mode parallel processing systems are capable of executing in either SIMD (synchronous) or MIMD (asynchronous) mode of parallelism. The ability to switch between the two modes at instruction-level granularity with very little overhead allows the parallelism mode to vary for each portion of an algorithm. To fully exploit the capability of intermixing both SIMD and MIMD operations within a s...
Impact of Temporal Juxtaposition on the Isolated Phase Optimization Approach to Mapping an Algorithm to Mixed-Mode Architectures
Mixed-mode parallel processing systems are capable of executing in either SIMD (synchronous) or MIMD (asynchronous) modes of parallelism. The ability to switch between the two modes at instruction-level granularity with very little overhead allows the parallelism mode to vary for each portion of an algorithm. To fully exploit the capability of intermixing both SIMD and MIMD operations wit...
Temporal Specifications of Component Based Systems with Polymorphic Dynamic Reconfiguration
In this chapter, we present a formal characterisation of component based systems with support for polymorphic dynamic reconfiguration. By dynamic reconfiguration we mean, as usual, changes in the system architecture at run time. By polymorphic reconfiguration we mean that reconfiguration operations may concern different types of components or connections, exploiting an inheritance relationship ...
Customising Graphics Applications: Techniques and Programming Interface
This paper identifies opportunities for customising architectures for graphics applications, such as infrared simulation and geometric visualisation. We have studied methods for exploiting custom data formats and datapath widths, and for optimising graphics operations such as texture mapping and hidden-surface removal. Techniques for balancing the graphics pipeline and for runtime reconfigurati...